\documentclass{vgtc}                          % final (conference style)
\usepackage{amsmath}
\usepackage{relsize}
%\documentclass[review]{vgtc}                 % review
%\documentclass[widereview]{vgtc}             % wide-spaced review
%\documentclass[preprint]{vgtc}               % preprint
%\documentclass[electronic]{vgtc}             % electronic version
\newif\ifGPblacktext
\GPblacktexttrue

%% Uncomment one of the lines above depending on where your paper is
%% in the conference process. ``review'' and ``widereview'' are for review
%% submission, ``preprint'' is for pre-publication, and the final version
%% doesn't use a specific qualifier. Further, ``electronic'' includes
%% hyperreferences for more convenient online viewing.

%% Please use one of the ``review'' options in combination with the
%% assigned online id (see below) ONLY if your paper uses a double blind
%% review process. Some conferences, like IEEE Vis and InfoVis, have NOT
%% in the past.

%% Figures should be in CMYK or Grey scale format, otherwise, colour 
%% shifting may occur during the printing process.

%% These three lines bring in essential packages: ``mathptmx'' for Type 1 
%% typefaces, ``graphicx'' for inclusion of EPS figures. and ``times''
%% for proper handling of the times font family.

\usepackage{bm}
\usepackage{mathptmx}
\usepackage{graphicx}
\usepackage{times}


\newcommand*{\SET}[1]  {\ensuremath{\mathcal{#1}}}
\newcommand*{\MAT}[1]  {\ensuremath{\mathbf{#1}}}
\newcommand*{\VEC}[1]  {\ensuremath{\bm{#1}}}
\newcommand*{\CONST}[1]{\ensuremath{\mathit{#1}}}
\newcommand*{\norm}[1]{\mathopen\| #1 \mathclose\|}% use instead of $\|x\|$
\newcommand*{\abs}[1]{\mathopen| #1 \mathclose|}% use instead of $|x|$
\newcommand*{\absLR}[1]{\left| #1 \right|}% use instead of $|x|$
\newcommand*{\normLR}[1]{\left\| #1 \right\|}% use instead of $\|x\|$

%% We encourage the use of mathptmx for consistent usage of times font
%% throughout the proceedings. However, if you encounter conflicts
%% with other math-related packages, you may want to disable it.

%% If you are submitting a paper to a conference for review with a double
%% blind reviewing process, please replace the value ``0'' below with your
%% OnlineID. Otherwise, you may safely leave it at ``0''.
\onlineid{0}

%% declare the category of your paper, only shown in review mode
\vgtccategory{Research}

%% allow for this line if you want the electronic option to work properly
\vgtcinsertpkg

%% In preprint mode you may define your own headline.
%\preprinttext{To appear in an IEEE VGTC sponsored conference.}

%% Paper title.

\title{SHREC'08 Entry: Shape Based Face Recognition with a Morphable Model}

%% This is how authors are specified in the conference style

%% Author and Affiliation (single author).
%%\author{Roy G. Biv\thanks{e-mail: roy.g.biv@aol.com}}
%%\affiliation{\scriptsize Allied Widgets Research}

%% Author and Affiliation (multiple authors with single affiliations).
%%\author{Roy G. Biv\thanks{e-mail: roy.g.biv@aol.com} %
%%\and Ed Grimley\thanks{e-mail:ed.grimley@aol.com} %
%%\and Martha Stewart\thanks{e-mail:martha.stewart@marthastewart.com}}
%%\affiliation{\scriptsize Martha Stewart Enterprises \\ Microsoft Research}

%% Author and Affiliation (multiple authors with multiple affiliations)
\author{Brian Amberg\thanks{e-mail: \{brian.amberg, reinhard.knothe, thomas.vetter\}@unibas.ch} %
        \and Reinhard Knothe$^*$%
	\and Thomas Vetter$^*$}%

%% Abstract section.
\abstract{
  We present a method for face recognition by fitting a 3D Morphable Model to
  shape data. Fitting is done with a robust nonrigid ICP algorithm. For
  recognition, it is possible to use either the fitted model parameters or the
  correspondences induced by the model. We compare different similarity measures
  and show that a 3D Morphable Model yields very robust retrieval results.
} % end of abstract

%% ACM Computing Classification System (CCS). 
%% See <http://www.acm.org/class/1998/> for details.
%% The ``\CCScat'' command takes four arguments.

\CCScatlist{ 
  \CCScat{I.4.8}{Image Processing and Computer Vision}{Scene Analysis}{Object Recognition, Surface Fitting, Range Data}
  %\CCScat{I.4.7}{Image Processing and Computer Vision}{Feature Measurement}{Size and Shape, Feature representation}
  I.4.7: Feature Measurement---Size and Shape, Feature representation
%  \CCScat{I.4.9}{Image Processing and Computer Vision}{Applications}{}
  I.4.9: Applications
%  \CCScat{I.4.10}{Image Processing and Computer Vision}{Image Representation}{Statistical}
  \CCScat{I.5.1}{Pattern Recognition}{Models}{Statistical}
%  \CCScat{I.5.1}{}{Applications}{Computer Vision}
}

%% Copyright space is enabled by default as required by guidelines.
%% It is disabled by the 'review' option or via the following command:
% \nocopyrightspace

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%% START OF THE PAPER %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

%% The ``\maketitle'' command must be the first command after the
%% ``\begin{document}'' command. It prepares and prints the title block.

%% the only exception to this rule is the \firstsection command
\firstsection{Introduction}

\maketitle

We tackle the task of textureless 3D face recognition. The system is fully
automatic and can handle the typical artifacts of 3D scanners, namely outliers
and missing regions. Face recognition in this setting is a difficult task, and
difficult tasks need strong prior knowledge. To introduce the prior knowledge
we use a 3D Morphable Model (3DMM)~\cite{blanz:model}, which is a generative
statistical model of 3D faces. 3DMMs have been applied successfully to face
recognition in different modalities. The most challenging setting is
recognition from single images under varying pose and illumination.  This was
addressed in~\cite{blanz03:face_rec,romdhani:recognition}. There, a 3DMM with a
shape, texture and illumination model was fit to probe and gallery images.  As
the model separates shape and albedo parameters from pose and lighting, it
enables pose and lighting invariant recognition.  In~\cite{amberg07:stereo} a
similar approach was used to fit a pure shape model to stereo images, also
enabling recognition by correlating the shape parameters. We use the same
approach for shape based face recognition. We fit a 3DMM built from 170
subjects with neutral expressions to the \hbox{gavabDB}~\cite{gavabdb} database, and
compare different distance measures which can be derived from the model fit.

An alternative to fitting a generative model is to align the probe to each
example in the database using e.g.\ ICP~\cite{bowyer05:icp_recognition}. But
comparing the probe directly to every gallery image has the disadvantage of
scaling linearly with the number of entries in the gallery, while for a model
based approach only a single fit to the probe is necessary, and the comparison
to the database can then be performed by a distance measure in the lower
dimensional space of registered faces.

Another interesting model-less approach~\cite{bronstein05:face_rec} compares
surfaces by the distribution of geodesic distances, which stays constant for nonrigidly
deforming (but not stretching or tearing) objects.  This approach is difficult
to apply in this setting though, as the scanning produces holes, disconnected
regions and strong noise, which can best be handled by a method which uses
specific information about the object class.
			  
\section{Fitting}
\begin{figure}
  \vspace{-0.5em}
  \begin{tabular}{@{ }c@{ }c@{ }c@{ }c@{}}
    \includegraphics[height=0.35\linewidth]{tgt}&
    \includegraphics[height=0.35\linewidth]{src}&
    \includegraphics[height=0.35\linewidth]{step}&
    \includegraphics[height=0.35\linewidth]{extrapolated}\\[-0.8em]
    \smaller a) Target & \smaller b) Fit & \smaller (a) + (b) & \smaller c) Deformed
  \end{tabular}
  \vspace{-1em}
  \caption{The robust fitting gives a good estimate (b) of the true face surface given
  the noisy measurement (a). It fills in holes and removes artifacts using prior
  knowledge from the face model. The fitted shape plus the exact
  correspondences found can be used to extrapolate the image by a robust
  Poisson deformation (c).}
  \label{fig:fitting}
\end{figure}
The fitting algorithm used in this paper is a variant of the nonrigid ICP work
in~\cite{amberg07:nicp}. It is a robust iterated fitting algorithm. Like other
ICP methods, it is a local optimization method, which does not guarantee
convergence to the global minimum and depends on the initialization. It
consists of the following steps:
\begin{itemize}
  \item Iterate over a sequence of regularization values $\theta_1>\dots>\theta_N$:
  \begin{itemize}
  \item Repeat until convergence:
    \begin{enumerate}
    \item Find candidate correspondences by searching for the closest compatible
      point for each model vertex.
    \item Weight the correspondences by their distance using a robust estimator.
    \item Fit the 3DMM to these correspondences using a
      regularization strength of $\theta_i$\label{step_fit}.
    \item Continue with the lower $\theta_{i+1}$ if the median change in vertex
      position is smaller than a threshold.
    \end{enumerate}
  \end{itemize}
\end{itemize}
The search for the closest compatible point takes into account only points
with conforming normals.
%The search is sped up by organizing the target scan in a space partitioning tree made up of spheres.
%The robust weighting function used in this paper is
%\emph{TODO: Check, what I have really done. I kind of hacked it.}
%\begin{align}
%  w(r) = \left\{\begin{array}{ll} 1+\frac{m_1-r}{m_1}l_0 & \text{if }r<m_1\\\frac{1}{m_2(r-m_1)+1} &\text{otherwise}\end{array}\right.
%\end{align}
%where $m_1$ was set to $3$mm  and $m_2$ to $20$mm. This leads to a robust
%estimation, which discards outliers. 
Note that it is necessary to balance robustness and regularization, as the
right balance depends on the noise characteristics of the data. Suitable values
were determined manually for a single scan and kept constant for all
experiments.  In step~\ref{step_fit} the 3DMM is fit to 3D-3D point
correspondences.  This is done with a Gauss-Newton least squares optimization,
using an analytic Jacobian and a first-order approximation of the Hessian.
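
As a minimal sketch of the fit in step~\ref{step_fit}, assume that the rigid
pose is held fixed and that the regularizer is the squared norm of the shape
coefficients $\VEC\alpha$. Both are simplifying assumptions made here for
illustration, as is the notation. With robust weights $w_k$, model vertices
$\VEC v_k(\VEC\alpha)$ and their correspondences $\VEC c_k$, the regularized
cost would be
\begin{align}
  E(\VEC\alpha) &= \sum_k w_k \norm{\VEC v_k(\VEC\alpha) - \VEC c_k}^2 + \theta_i \norm{\VEC\alpha}^2\qquad.
\end{align}
Stacking the residuals into $\VEC r$, their Jacobian with respect to
$\VEC\alpha$ into $\MAT J$, and the weights (each repeated for the three
coordinates) into a diagonal matrix $\MAT W$, a Gauss-Newton step $\VEC\delta$
solves
\begin{align}
  (\MAT J^T\MAT W\MAT J + \theta_i\MAT I)\,\VEC\delta &= -(\MAT J^T\MAT W\VEC r + \theta_i\VEC\alpha)\qquad,
\end{align}
followed by the update $\VEC\alpha \leftarrow \VEC\alpha + \VEC\delta$. A large
$\theta_i$ keeps the fit close to the mean face, and lowering it lets the model
follow the data more closely.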

As the database is pose normalized, we initialize the registration such that
the tip of the nose and the pose coincide. This initialization is good enough to
fit the complete database fully automatically. For non-pose-normalized databases,
we would need either three landmarks or, to keep the algorithm fully
automatic, repeated random initialization.

\section{Retrieval}
We evaluated four different distance measures; Figure~\ref{fig:precision} gives an overview of their precision-recall characteristics.

\subsection{Model Based Measures}
We begin with measures that act in the parameter space of the model.
These have the advantage of being extremely cheap to calculate once the model
has been fit.
\begin{figure}
  \hspace{2.5em}\scalebox{0.68}{\input{precision_recall}}
  \vspace{-0.7em}
  \caption{Measures in Mahalanobis Space outperform vertex based measures.}
  \label{fig:precision}
\end{figure}
\begin{figure}
  \vspace{-1.0em}
  \hspace{2.5em}\scalebox{0.68}{\input{impostor}}
  \vspace{0.1em}
  \caption{Impostor detection is reliable, as the minimum distance to a match is smaller than the minimum distance to a nonmatch.}
  \label{fig:impostor}
\end{figure}

\subsubsection{Mahalanobis distance of shape coefficients}
%\paragraph{Mahalanobis distance of shape coefficients}
The first method calculates the distance between two vectors of shape
coefficients $\VEC \alpha_1$ and $\VEC\alpha_2$ expressed in Mahalanobis space
as
\begin{align}
  s_1(\VEC\alpha_1, \VEC\alpha_2) &= \norm{\VEC\alpha_1 - \VEC\alpha_2}\qquad.
\end{align}
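To make the term \emph{Mahalanobis space} concrete, assume a PCA-based model in
which a shape is $\VEC x = \VEC\mu + \sum_k \sigma_k\alpha_k\VEC u_k$, with mean
$\VEC\mu$, orthonormal principal components $\VEC u_k$ and standard deviations
$\sigma_k$. This notation is introduced only for illustration. The plain PCA
coefficients are then $c_k = \sigma_k\alpha_k$, and the Euclidean distance
between two vectors $\VEC\alpha$ equals the Mahalanobis distance between the
corresponding vectors $\VEC c$:
\begin{align}
  s_1(\VEC\alpha_1, \VEC\alpha_2)^2 &= \sum_k (\alpha_{1,k} - \alpha_{2,k})^2 = \sum_k \frac{(c_{1,k} - c_{2,k})^2}{\sigma_k^2}\qquad.
\end{align}
Each component is thus weighted by the inverse of its variance in the training
set.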

\subsubsection{Angular distance of shape coefficients}
In face space, caricatures lie along the rays from the origin.
Mapping all caricatured versions of a face onto a canonical face gives a method
which has proven to have very high recognition rates~\cite{blanz03:face_rec}.
To do this we use the angle between the shape coefficients in Mahalanobis space
as the distance measure:
\begin{align}
  s_2(\VEC\alpha_1, \VEC\alpha_2) &= \arccos\left(\frac{\VEC\alpha_1^T\VEC\alpha_2}{\norm{\VEC\alpha_1}\norm{\VEC\alpha_2}}\right)\qquad.
\end{align}
For the angular measure, Figure~\ref{fig:impostor} shows the distribution of
distances to the first match and the first nonmatch in the database. This
shows that, by choosing a suitable threshold, it is possible to perform face
recognition with impostors, where we decide whether the identity is in the
database or not.

These two measures give similar results, but the caricature invariance of the
angular distance slightly improves recall. This tells us that, in face space,
the direction alone encodes the identity.
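The caricature invariance can be verified directly. For any $\lambda > 0$,
\begin{align}
  s_2(\lambda\VEC\alpha_1, \VEC\alpha_2) &= \arccos\left(\frac{\lambda\VEC\alpha_1^T\VEC\alpha_2}{\lambda\norm{\VEC\alpha_1}\norm{\VEC\alpha_2}}\right) = s_2(\VEC\alpha_1, \VEC\alpha_2)\qquad,
\end{align}
whereas $s_1$ changes with $\lambda$. As a toy example with made-up
coefficients, take $\VEC\alpha_1 = (3,4)^T$ and its caricature
$\VEC\alpha_2 = 2\VEC\alpha_1 = (6,8)^T$. Then
$s_1 = \norm{(3,4)^T} = 5$, while $s_2 = \arccos(50/(5\cdot 10)) = 0$.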

\subsection{Shape Based Measures}
The second type of measure acts in vertex space. If we want to compare
two model instances, it does not make much sense to measure in vertex space
instead of parameter space, as there is a one-to-one mapping between the spaces
and the parameter space is of much lower dimensionality. But in our case we do
have additional information which cannot be expressed by the model.
After fitting, some residual will remain, which has three causes: 1)
The individuals used to train the model were not from the gallery, and we
cannot span the complete face space with 170 training examples.  2) The probe
images have expressions, while the database was built using neutral expressions.
3) The acquisition process introduces noise. Therefore, we add more
flexibility to the model by allowing smooth nonrigid deformations of the final
fit to minimize the remaining residual. This is achieved by robustly fitting a
Poisson deformation with soft boundary and zero right hand side, where the
boundary is given by the correspondences found by the fitting algorithm, and
the deformed shape is the fitted head.  Results can be seen in
Figure~\ref{fig:fitting}.  With these correspondences established we can use
different distance measures.
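
The Poisson deformation described above could, for instance, be posed as a
least-squares problem for the per-vertex displacements $\MAT D$ (one row per
vertex). This is a sketch with notation introduced only for this illustration,
not necessarily the exact formulation used. Let $\MAT L$ be a discrete
Laplacian of the fitted mesh, $\MAT C$ the selection matrix of the vertices
that have a correspondence, $\MAT T$ the displacements to those
correspondences, $\MAT W$ the diagonal matrix of robust weights, and $\lambda$
the stiffness of the soft boundary. Then
\begin{align}
  \min_{\MAT D}\; \norm{\MAT L\MAT D}_F^2 + \lambda\norm{\MAT W^{1/2}(\MAT C\MAT D - \MAT T)}_F^2
\end{align}
has a zero right hand side in the Laplacian term and leads to the sparse linear
system
\begin{align}
  (\MAT L^T\MAT L + \lambda\MAT C^T\MAT W\MAT C)\,\MAT D &= \lambda\MAT C^T\MAT W\MAT T\qquad.
\end{align}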

\subsubsection{Distance}
Denote the $N$ vertices of the registered scan $i$ as $\VEC v^i_1,\dots,\VEC
v^i_N$.  We use the distance after removing the rigid transformation, measuring
only within a mask defined on the model that covers the parts visible
in most of the scans. We compute
\begin{align}
  s_3(\MAT v^1, \MAT v^2)  &= \min_{\MAT R,\VEC t} \sum_i \norm{\MAT R\VEC v^1_i + \VEC t - \VEC v^2_i}\qquad,
\end{align}
where $\MAT R,\VEC t$ describe a similarity transform.
%
While this measure is straightforward, the recognition results are
unsatisfactory. The scaling of face space which is learned from the example faces
results in an improved clustering of scans, which enables better classification
than comparisons in the original vertex space.
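
For reference, if the norms in $s_3$ are squared and the transform is
restricted to a rotation and a translation (both assumptions made here; robust
or scaled variants need more care), the minimization has a well-known
closed-form solution. With the centroids $\bar{\VEC v}^1$ and $\bar{\VEC v}^2$
of the masked vertices, form
\begin{align}
  \MAT H &= \sum_i (\VEC v^1_i - \bar{\VEC v}^1)(\VEC v^2_i - \bar{\VEC v}^2)^T = \MAT U\MAT S\MAT V^T\qquad,
\end{align}
where the right hand side is the singular value decomposition of $\MAT H$. The
optimal transform is then
\begin{align}
  \MAT R &= \MAT V\operatorname{diag}\bigl(1, 1, \det(\MAT V\MAT U^T)\bigr)\MAT U^T\qquad, \\
  \VEC t &= \bar{\VEC v}^2 - \MAT R\bar{\VEC v}^1\qquad.
\end{align}
The determinant term guards against a reflection being returned instead of a
rotation.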

\subsubsection{Geodesics}
Inspired by~\cite{bronstein05:face_rec} we tried a geodesic-based measure,
which should be invariant to expression changes. We classify by comparing
the distances between selected pairs of vertices, which were assumed not to change
under expressions.  Denote the selected vertex pairs by $\SET P$. Note that
this is different from~\cite{bronstein05:face_rec}, as we have already brought
the meshes into correspondence during fitting. The measure is then
\begin{align}
  s_4(\MAT v^1, \MAT v^2) &= \sum_{(i,j)\in \SET P} \absLR{ \norm{\VEC v^1_i - \VEC v^1_j} - \norm{\VEC v^2_i - \VEC v^2_j}}\qquad.
\end{align}
Experiments with different sets of distances showed that the best
classification results were achieved by selecting all neighboring edges of the
model in the face area. Still, exactly because this method introduces some
invariance, it also reduces the precision in the retrieval experiments. Also,
it does not incorporate the knowledge about face space, which was exploited in
the first two methods.
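Unlike $s_3$, this measure needs no alignment step, because pairwise distances
are unchanged by a rigid motion: for any rotation $\MAT R$ and translation
$\VEC t$,
\begin{align}
  \norm{(\MAT R\VEC v_i + \VEC t) - (\MAT R\VEC v_j + \VEC t)} &= \norm{\MAT R(\VEC v_i - \VEC v_j)} = \norm{\VEC v_i - \VEC v_j}\qquad.
\end{align}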

\section{Conclusion}
We have shown that 3D Morphable Models provide a valuable tool for face
recognition, with a 98.8\% recognition rate on this database. The strong prior
knowledge allows robust handling of noisy data. Four distance measures
were compared, and it turns out that the angular distance in Mahalanobis space
is the most accurate classifier. It was noticeably better than measuring
the difference between the registered scans. This is because the face space
learned from our training examples is constructed such that the identities are
better separated than in vertex space.

In the future we wish to make the system expression invariant by using a
model which separates expression and identity. As we do establish
correspondence between the model and the scans, it is trivial to add
image-based classification for datasets where a calibrated photo is available, by
comparing the rectified textures.

%% if specified like this the section will be ommitted in review mode
\acknowledgements{The authors wish to thank P.\ Paysan for the data
acquisition. This work was supported in part by a grant from Microsoft
Research and the Swiss National Science Foundation (200021-103814 and NCCR COME project 5005-66380).}

\bibliographystyle{abbrv}
%%use following if all content of bibtex file should be shown
%\nocite{*}
\bibliography{shrec_08}
\end{document}
